[OpenCL] Change of OpenCL profiling logic #11180

argrento · 2022-04-29T11:46:58Z

Profiling in TVM is enabled or disabled at compile time by the USE_PROFILER switch. This means that if we enable the profiler in config.cmake but do not use any profiling features in the app, OpenCL is still forced to collect cl_event objects.

Build TVM with set(USE_PROFILER ON).
Consider a simple app in which we create a module from the .so file:

// ctx is the target device (an OpenCL device in this scenario); its definition is omitted here.
tvm::runtime::Module mod_factory = tvm::runtime::Module::LoadFromFile("model.so");
tvm::runtime::Module gmod = mod_factory.GetFunction("default")(ctx);
tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
tvm::runtime::PackedFunc get_input = gmod.GetFunction("get_input");
tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");
tvm::runtime::PackedFunc run = gmod.GetFunction("run");

// set inputs and outputs

size_t niterations = 5000;
for (size_t i = 0; i < niterations; i++) {
  run();
}

Then we collect memory usage info with Valgrind.

    MB
818.5^                                                                       #
     |                                                                    @@@#
     |                                                                @@@@@@@#
     |                                                            @@@@@@@@@@@#
     |                                                         @@@@@@@@@@@@@@#
     |                                                     ::::@@@@@@@@@@@@@@#
     |                                                  @:::: :@@@@@@@@@@@@@@#
     |                                             @@@:@@:::: :@@@@@@@@@@@@@@#
     |                                           @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                                      :@@@@@@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                                  :@@::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                               ::::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                           :@@@: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                       :::@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                   ::@@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                @@@: @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |           :::::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @@:::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
   0 +----------------------------------------------------------------------->Gi
     0                                                                   75.95

We do not use any profiling info, but it is collected implicitly because of the compile-time switches:

tvm/src/runtime/opencl/opencl_device_api.cc, line 431 in 6babb89:

clCreateCommandQueue(this->context, did, CL_QUEUE_PROFILING_ENABLE, &err_code));

tvm/src/runtime/opencl/opencl_module.cc, line 84 in 6babb89:

OPENCL_CALL(clEnqueueNDRangeKernel(queue, kernel, work_dim, nullptr, wl.work_size,

With the proposed modifications this behavior changes: by default the clCommandQueue is created in normal mode, and it is recreated with profiling capabilities only when the user calls the profiler explicitly. When the profiling session finishes, the queue is recreated in normal mode again, which allows profile() calls and run() calls to be mixed.
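The recreate-on-demand behavior described above can be sketched roughly as follows. This is a simplified model, not the code from this PR: `MockQueue`, `MockWorkspace`, and `ProfilingSession` are hypothetical names, and the mock queue stands in for a real queue that would be created and released through the OpenCL API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an OpenCL command queue. The id changes on every
// re-creation, mimicking a fresh handle returned by the OpenCL runtime.
struct MockQueue {
  std::uint64_t id;
  bool profiling;  // models the CL_QUEUE_PROFILING_ENABLE property
};

class MockWorkspace {
 public:
  MockWorkspace() : queue_{next_id_++, false} {}  // normal mode by default

  const MockQueue& queue() const { return queue_; }

  // Recreate the queue only if the requested mode differs from the current one.
  void EnableQueueProfiling(bool enable) {
    if (queue_.profiling == enable) return;
    // Real code would wait for pending work and release the old queue here
    // before creating the replacement.
    queue_ = MockQueue{next_id_++, enable};
  }

 private:
  std::uint64_t next_id_ = 1;  // declared before queue_ so it is initialized first
  MockQueue queue_;
};

// RAII helper: the queue has profiling enabled only while a session is alive.
class ProfilingSession {
 public:
  explicit ProfilingSession(MockWorkspace& ws) : ws_(ws) {
    ws_.EnableQueueProfiling(true);
    assert(ws_.queue().profiling);
  }
  ~ProfilingSession() { ws_.EnableQueueProfiling(false); }

 private:
  MockWorkspace& ws_;
};
```

In this model, plain run() calls made outside a `ProfilingSession` scope use a queue without profiling, so no profiling events need to be kept for them.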

With the proposed changes, Valgrind shows no abnormal memory usage for the example above.

    MB
148.9^#                                                                       
     |#::::::::::::::::::::::::::@:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   83.07

src/runtime/opencl/opencl_common.h

echuraev

In general LGTM. Thanks

echuraev · 2022-05-10T16:07:25Z

@masahi could you please take a look at this PR?

* Enable profiling only when it is used explicitly
* Change logic of clCommandQueue create/destroy
* Update comments
* Linter fix
* Refactor queue create
* Move queue recreation logic to function
* Replace profiling flag by the queue info request
* Enhance readability
* Fix linter errors

srkreddy1238 · 2022-09-01T05:01:35Z

@argrento

This PR causes a CLML profiling failure. The reason is explained below.

In general, the workspaces can be accessed and shared via "device_api.opencl". The CLML integration shares the workspace created by the default OpenCL runtime and holds a reference to its command queue. Recreating the command queue in between makes that reference invalid.
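The failure mode described above can be modeled with a small sketch. All names here (`MockQueue`, `MockWorkspace`, `CachingConsumer`, `LookupConsumer`) are hypothetical and only illustrate the idea; this is not the CLML code. A component that caches the queue handle once ends up with a stale handle after the queue is recreated, while a component that looks the handle up on each use does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a command queue handle owned by a shared workspace.
struct MockQueue {
  std::uint64_t id;
  bool profiling;
};

struct MockWorkspace {
  std::uint64_t next_id = 2;
  MockQueue queue{1, false};
  // Models releasing the old queue and creating a new one in another mode.
  void RecreateQueue(bool profiling) { queue = MockQueue{next_id++, profiling}; }
};

// Consumer that copies the handle once at construction time.
struct CachingConsumer {
  std::uint64_t cached_queue_id;
  explicit CachingConsumer(const MockWorkspace& ws) : cached_queue_id(ws.queue.id) {}
  bool HandleIsCurrent(const MockWorkspace& ws) const {
    return cached_queue_id == ws.queue.id;
  }
};

// Consumer that re-reads the handle from the workspace on every use.
struct LookupConsumer {
  const MockWorkspace* ws;
  std::uint64_t CurrentQueueId() const { return ws->queue.id; }
};

// Small demonstration of the two behaviors.
inline void DemoStaleHandle() {
  MockWorkspace ws;
  CachingConsumer caching(ws);
  LookupConsumer lookup{&ws};
  ws.RecreateQueue(true);                        // profiling session starts
  assert(!caching.HandleIsCurrent(ws));          // cached handle is now stale
  assert(lookup.CurrentQueueId() == ws.queue.id);  // lookup stays valid
}
```

Looking up the queue on each use is only one possible way to avoid the stale handle; keeping the queue fixed for the lifetime of the workspace is another.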

Why do we enable the profiler at compile time when we don't want to profile anything?
Or
Is there any case where we enable profiling for something else but not OpenCL, or enable/disable profiling dynamically? In such cases we could consider a separate compilation flag for OpenCL, or an environment variable to control it at runtime launch.

In any case, dynamically recreating the queue will cause issues for other components.
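A minimal sketch of the environment-variable alternative mentioned above, under the assumption that the queue mode is chosen once at startup and never changed afterwards. The helper names are hypothetical, and any variable name passed in (for example `TVM_OPENCL_PROFILING`) is an illustration, not an existing TVM setting. The constant mirrors the bit value of CL_QUEUE_PROFILING_ENABLE in the OpenCL headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Same bit value as CL_QUEUE_PROFILING_ENABLE (1 << 1) in the OpenCL headers.
constexpr std::uint64_t kQueueProfilingEnable = 1ULL << 1;

// Treat only the exact string "1" as a request to enable profiling.
inline bool ParseProfilingFlag(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Decide the queue properties once, from a caller-supplied variable name.
// The queue created with these properties is then kept for the whole run,
// so handles held by other components stay valid.
inline std::uint64_t QueuePropertiesFromEnv(const char* var_name) {
  assert(var_name != nullptr);
  return ParseProfilingFlag(std::getenv(var_name)) ? kQueueProfilingEnable : 0;
}
```

The trade-off is that profiling cannot be toggled within a single process, which is what the merged change allows.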

@masahi & @valmat07, please comment.

argrento added 3 commits April 25, 2022 14:27

Enable profiling only when it is used explicitly

4157ea0

Change logic of clCommandQueue create/destroy

5619902

Update comments

4d22c06

argrento changed the title ~~Change of OpenCL profiling logic~~ [OpenCL] Change of OpenCL profiling logic Apr 29, 2022

Linter fix

8246261

argrento force-pushed the opencl_profiling branch from 15f2d6e to 8246261 Compare April 29, 2022 12:12

echuraev requested changes Apr 29, 2022

src/runtime/opencl/opencl_common.h (outdated, resolved)

echuraev reviewed Apr 29, 2022

src/runtime/opencl/opencl_common.h (outdated, resolved)

argrento changed the title ~~[OpenCL] Change of OpenCL profiling logic~~ [WIP] [OpenCL] Change of OpenCL profiling logic Apr 30, 2022

argrento added 3 commits May 5, 2022 13:00

Refactor queue create

d94c6d0

Move queue recreation logic to function

5a34acb

Replace profiling flag by the queue info request

eadcec2

echuraev reviewed May 6, 2022

src/runtime/opencl/opencl_common.h (3 threads, outdated, resolved)

Enhance readability

0616943

echuraev approved these changes May 8, 2022

argrento changed the title ~~[WIP] [OpenCL] Change of OpenCL profiling logic~~ [OpenCL] Change of OpenCL profiling logic May 9, 2022

Fix linter errors

e3bd930

argrento force-pushed the opencl_profiling branch from b3dd41b to e3bd930 Compare May 9, 2022 11:19

masahi approved these changes May 10, 2022

masahi merged commit 0f6abea into apache:main May 10, 2022

argrento deleted the opencl_profiling branch May 12, 2022 10:08

argrento restored the opencl_profiling branch May 12, 2022 10:08

srkreddy1238 mentioned this pull request Sep 6, 2022

[OpenCLML] CLML Profiling fixes corresponding to OpenCL Timer recent … #12711

Merged
